Introduction

This document compares the genomic information provided by two references:
- BWr: release 28 of Ensembl for the Bread Wheat transcriptome (IWGSC for the fasta sequences + American PopSeq for the physical positions)
- ADr: Assaf genomic data coming from Dicoccoides

We have a genetic map containing 16000 markers. We are going to compare the genetic order of the contigs with the physical order provided by these 2 references.
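As a rough illustration of what "comparing genetic order with physical order" can mean in practice, here is a minimal sketch on toy data. It assumes that marker names in the genetic map match contig names in the physical reference (an assumption, not something verified here), and it uses a per-chromosome Spearman rank correlation as one possible measure of order agreement. All values and the names `toy_map`, `toy_phys` and `order_agreement` are hypothetical.

```r
library(dplyr)

# Toy data (hypothetical values): genetic positions in cM, physical positions in bp
toy_map <- data.frame(
  chromo_map   = c("1A", "1A", "1A", "1A", "1B", "1B", "1B"),
  marker       = c("c1", "c2", "c3", "c4", "c5", "c6", "c7"),
  position_map = c(0, 12.5, 30.1, 55.0, 0, 20.3, 41.7)
)
toy_phys <- data.frame(
  contig       = c("c1", "c2", "c3", "c4", "c5", "c6", "c7"),
  position_phy = c(1e6, 5e6, 3e6, 9e6, 8e6, 4e6, 2e6)
)

# Join on the contig name (assumed to be shared), then compute one
# Spearman rank correlation per chromosome between the two orders
order_agreement <- toy_map %>%
  inner_join(toy_phys, by = c("marker" = "contig")) %>%
  group_by(chromo_map) %>%
  summarise(n = n(), rho = cor(position_map, position_phy, method = "spearman"))

print(order_agreement)
```

A value of rho close to 1 means both orders agree, and a value close to -1 suggests the chromosome is oriented the other way round in one of the two sources.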

Load some libraries that will be useful:

library(tidyverse)
library(plotly)

Specify where the data are stored:

path="/Users/yan/Dropbox/Publi_Fusariose/ANALYSIS_REPRO/DATA/HassaF_DATA_ISRAEL/"
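Since several absolute paths appear in this document, one option is to derive all of them from a single root folder with `file.path()`. The sketch below is only a suggestion: `data_root` is a hypothetical placeholder to replace with your own location, and the folder layout is assumed from the paths used in this document.

```r
# Hypothetical root folder: replace with the location of your own copy of the data
data_root <- "path/to/ANALYSIS_REPRO/DATA"

# Build every input path from the single root (file names taken from this document)
map_file <- file.path(data_root, "map_avec_posi_physique.txt")
adr_file <- file.path(data_root, "HassaF_DATA_ISRAEL", "hc_genes_info.tab")
bwr_file <- file.path(data_root, "physical_map_of_BW.txt")

# Once data_root is set, this check stops early if a file is missing:
# stopifnot(file.exists(map_file), file.exists(adr_file), file.exists(bwr_file))
```

With this layout, reproducing the analysis on another machine only requires changing `data_root`.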

And load the data:

# Watch out: to reproduce the analysis, you have to update the paths.

# Genetic map
map=read.table("/Users/yan/Dropbox/Publi_Fusariose/ANALYSIS_REPRO/DATA/map_avec_posi_physique.txt" , header=T )[,c(1:3)]
colnames(map)=c("chromo_map","marker","position_map")

# Assaf Data ADr
ADr=read.table(paste(path,"hc_genes_info.tab",sep=""), header=T, sep=",")

# BWr
BWr=read.table("/Users/yan/Dropbox/Publi_Fusariose/ANALYSIS_REPRO/DATA/physical_map_of_BW.txt", na.strings="-" , header=T , dec=".")
colnames(BWr)=c( "chromo" ,  "contig" ,"position_BWr" )
BWr=BWr %>%  filter(!grepl("D",chromo)) %>% droplevels()
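The last line of the chunk above removes every chromosome whose name contains a "D". A plausible reason (my reading, not stated in the source) is that wild emmer (Dicoccoides) is tetraploid and carries only the A and B genomes, so the D chromosomes of bread wheat have no counterpart in the genetic map. A minimal sketch of the filter's effect on a hypothetical toy table with the same column names:

```r
library(dplyr)

# Toy table (hypothetical rows) with the same column names as BWr
toy_BWr <- data.frame(
  chromo       = factor(c("1A", "1B", "1D", "3B", "7D")),
  contig       = c("ctg1", "ctg2", "ctg3", "ctg4", "ctg5"),
  position_BWr = c(100, 200, 300, 400, 500)
)

# Same filter as above: drop chromosomes whose name contains "D",
# then drop the factor levels that are no longer used
toy_AB <- toy_BWr %>% filter(!grepl("D", chromo)) %>% droplevels()

print(levels(toy_AB$chromo))
```

Without `droplevels()`, the unused D levels would remain in the factor and could show up as empty panels or categories in later plots.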

1/ Genetic Map data

The genetic map was built for the Dic2 x Silur RIL population. SNP markers come from RNA-Seq + capture (~16000 markers). Reads were mapped on the BWr.

Let’s build a figure that summarizes the main features of this map:

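# One label per chromosome: name | length in cM | number of SNPs
my_text=map %>% group_by(chromo_map) %>% summarise(Length = max(position_map), Nb = n()) %>% mutate(text = paste(chromo_map, " | ", Length, " cM | ", Nb, " SNPs"))
# labeller() expects a named vector: names are chromosome names, values are the facet titles
facet_labels=setNames(my_text$text, as.character(my_text$chromo_map))
ggplot(map, aes(x=position_map)) + 
  geom_density(aes(fill=chromo_map)) + 
  theme(legend.position = "none") + 
  facet_wrap(~chromo_map, labeller=labeller(chromo_map=facet_labels))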

2/ BWr: Ensembl release 28

The Bread Wheat Reference (BWr) was retrieved from Ensembl release 28. The physical positions come from two sources:
- For chromosome 3B, they are provided by the IWGSC and are very precise.
- For the other chromosomes, they are provided by the PopSeq experiment (American) and are pseudo-physical positions.

Let’s summarize the features of this genomic reference.

# Gene density along the chromosomes?
# Plot the position converted to Mb (pos) so that the axis label matches the data
BWr %>% mutate(pos=position_BWr/1000000) %>% 
  ggplot( aes(x=pos)) + 
    geom_density(aes(fill=chromo, color=chromo), alpha=0.8) + 
    facet_wrap(~chromo, scales="free") + 
    theme(legend.position="none", axis.text=element_blank() , axis.ticks= element_blank() ) + 
    xlab("position in Mb")

3/ ADr: Assaf Data

These data have been recovered here. They have been stored on CC2 here:

/gs7k1/projects/g2pop/HOLTZ_YAN_DATA/BREAD_WHEAT_IWGSC_and_HORDEUM/DATA_ASSAF_ISRAEL

# Change chromosome names
ADr$seqid=gsub("chr","",ADr$seqid)
ADr=ADr[which(ADr$seqid!="Un"),] %>% droplevels()

# Size of each chromosome?
ADr %>% group_by(seqid) %>% summarise(Value = max(end)) %>% 
    ggplot(aes(x=seqid, y=Value)) + geom_bar(stat="identity", fill=rgb(0.8,0.4,0.6,0.7)) + xlab("") + ylab("length of chromosomes")

# Distribution of gene sizes?
ggplot(ADr, aes(x=end-start)) + geom_histogram(fill=76) +  xlab("Gene size (bp)")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Gene density along the chromosomes?
# Each gene is placed at its midpoint, converted to Mb
ADr %>% mutate(pos=(end+start)/2/1000000) %>% 
  ggplot( aes(x=pos)) + 
    geom_density(aes(fill=seqid, color=seqid), alpha=0.5) + 
    facet_wrap(~seqid) + 
    theme(legend.position="none") + 
    xlab("position in Mb")
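The density curves above are smoothed, so it can help to look at raw gene counts per window as well. The sketch below is a possible complement on toy data, not part of the original analysis: it counts genes per 1 Mb window using the gene midpoint. The table `toy_genes`, the `window_size` value and all coordinates are hypothetical, and the column names are assumed to match those used for ADr (`seqid`, `start`, `end`).

```r
library(dplyr)

# Toy gene table (hypothetical coordinates in bp)
toy_genes <- data.frame(
  seqid = c("1A", "1A", "1A", "1B", "1B"),
  start = c(100000, 1200000, 1800000, 300000, 2500000),
  end   = c(200000, 1300000, 1900000, 500000, 2600000)
)

window_size <- 1e6  # window width in bp (1 Mb), an arbitrary choice

# Assign each gene to a window by its midpoint, then count genes per window
gene_counts <- toy_genes %>%
  mutate(mid = (start + end) / 2, window = floor(mid / window_size)) %>%
  count(seqid, window, name = "n_genes")

print(gene_counts)
```

Window 0 covers positions from 0 to 1 Mb, window 1 from 1 to 2 Mb, and so on. Windows without any gene do not appear in the result.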